Reducing Data Movement Costs Using Energy-Efficient, Active Computation on SSD
Authors
Abstract
Modern scientific discovery often involves running complex application simulations on supercomputers, followed by a sequence of data analysis tasks on smaller clusters. This offline approach incurs significant data movement costs, including redundant I/O, storage bandwidth bottlenecks, and wasted CPU cycles, all of which contribute to increased energy consumption and degraded end-to-end performance. Technology projections for exascale machines indicate that energy efficiency will become the primary design metric, and the energy cost of data movement is estimated to soon rival the cost of computation. Consequently, data movement costs in data analysis can no longer be ignored. To address these challenges, we advocate executing data analysis tasks on emerging storage devices such as SSDs. In extreme-scale systems, SSDs typically serve only as temporary storage for simulation output data. In our approach, Active Flash, we propose conducting in-situ data analysis on the SSD controller without degrading the performance of the simulation job. Migrating analysis tasks closer to where the data resides reduces the data movement cost. We present detailed energy and performance models for both the Active Flash and offline strategies, and study them using extreme-scale application simulations, commonly used data analytics kernels, and supercomputer system configurations. Our evaluation suggests that Active Flash is a promising approach to alleviate the storage bandwidth bottleneck, reduce the data movement cost, and improve overall energy efficiency.
Similar resources
Active flash: towards energy-efficient, in-situ data analytics on extreme-scale machines
Modern scientific discovery is increasingly driven by large-scale supercomputing simulations, followed by data analysis tasks. These data analyses are either performed offline, on smaller-scale clusters, or in-situ, on the supercomputer itself. Both of these strategies are rife with storage and I/O bottlenecks, energy inefficiencies due to increased data movement, and increased time to solution...
Gullfoss: Accelerating and Simplifying Data Movement among Heterogeneous Computing and Storage Resources
High-end computer systems increasingly rely on heterogeneous computing resources. For instance, a datacenter server might include multiple CPUs, high-end GPUs, PCIe SSDs, and high-speed networking interface cards. All of these components provide computing resources and operate at a high bandwidth. Coordinating the movement of data and scheduling computation across these resources is a complex t...
Improving Node-Level MapReduce Performance Using Processing-in-Memory Technologies
Processing-in-Memory (PIM) is the concept of moving computation as close as possible to memory. This decreases the need to move data between the central processor and the memory system, and hence improves energy efficiency through reduced memory traffic. In this paper we present our approach to embedding processing cores in 3D-stacked memories, and evaluate the use of such a system for Big ...
BF-Tree: Approximate Tree Indexing
The increasing volume of time-based generated data and the shift in storage technologies suggest that we might need to reconsider indexing. Several workloads, such as social and service monitoring, often include attributes with implicit clustering because of their time-dependent nature. In addition, solid state disks (SSDs), using flash or other low-level technologies, emerge as viable competitors of...
A second-order stochastic dominance portfolio efficiency measure
In this paper, we introduce a new linear programming second-order stochastic dominance (SSD) portfolio efficiency test for portfolios with scenario approach for distribution of outcomes and a new SSD portfolio inefficiency measure. The test utilizes the relationship between CVaR and dual second-order stochastic dominance, and contrary to tests in Post [14] and Kuosmanen [7], our test detects a ...